Transferrable Representations for Visual Recognition
Author: Jeffrey Donahue
Abstract
by Jeffrey Donahue. Doctor of Philosophy in Computer Science, University of California, Berkeley. Professor Trevor Darrell, Chair.

The rapid progress in visual recognition capabilities over the past several years can be attributed largely to improvements in generic and transferrable feature representations, particularly learned representations based on convolutional networks (convnets) trained "end-to-end" to predict visual semantics given raw pixel intensity values. In this thesis, we analyze the structure of these convnet representations and their generality and transferability to other tasks and settings. We begin in Chapter 2 by examining the hierarchical semantic structure that naturally emerges in convnet representations from large-scale supervised training, even when this structure is unobserved in the training set. Empirically, the resulting representations generalize surprisingly well to classification in related yet distinct settings.

Chapters 3 and 4 showcase the flexibility of convnet-based representations for prediction tasks where the inputs or targets have more complex structure. Chapter 3 focuses on representation transfer to the object detection and semantic segmentation tasks, in which objects must be localized within an image as well as labeled. Chapter 4 augments convnets with recurrent structure to handle recognition problems with sequential inputs (e.g., video activity recognition) or outputs (e.g., image captioning). Across each of these domains, end-to-end fine-tuning of the representation for the target task provides a substantial additional performance benefit.

Finally, we address the necessity of label supervision for representation learning. In Chapter 5 we propose an unsupervised learning approach based on generative models, demonstrating that some of the transferrable semantic structure learned by supervised convnets can be learned from images alone.
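A recurring theme in the abstract is the "pretrain, transfer, fine-tune" recipe: start from a convnet trained with large-scale supervision, swap in a classifier head for the target task, and either use the network as a fixed feature extractor or fine-tune it end-to-end. The sketch below is a minimal illustration of that recipe, not the thesis's own pipeline; it assumes PyTorch with torchvision 0.13 or later, uses a torchvision ResNet-18 as a stand-in for a generic pretrained convnet, and the 20-class target task is hypothetical.

# Minimal sketch of the pretrain-transfer-fine-tune recipe summarized above.
# Illustrative only: not the thesis's pipeline. Assumes PyTorch and
# torchvision >= 0.13; ResNet-18 stands in for a generic pretrained convnet,
# and NUM_TARGET_CLASSES is a hypothetical target-task label count.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_TARGET_CLASSES = 20  # assumed size of the new task's label set

# 1. Start from a convnet trained with large-scale supervision (ImageNet).
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2. Replace the final classifier so its output matches the target task.
net.fc = nn.Linear(net.fc.in_features, NUM_TARGET_CLASSES)

# 3a. Fixed-feature transfer: freeze every pretrained layer so that only the
#     newly added head is trained. Skip this loop for end-to-end fine-tuning.
for name, param in net.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

# 3b. Optimizer with a smaller learning rate for the pretrained layers than
#     for the new head; when 3a is applied, the frozen layers simply receive
#     no gradient and are left unchanged.
optimizer = torch.optim.SGD(
    [
        {"params": [p for n, p in net.named_parameters() if not n.startswith("fc.")],
         "lr": 1e-4},
        {"params": list(net.fc.parameters()), "lr": 1e-2},
    ],
    momentum=0.9,
)

# A standard training loop over labeled target-task images then updates `net`
# via optimizer.step(); freezing versus full fine-tuning is the only switch.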
Similar resources
Robust Visual Knowledge Transfer via EDA
We address the problem of visual knowledge adaptation by leveraging labeled patterns from a source domain and a very limited number of labeled instances in a target domain to learn a robust classifier for visual categorization. This paper proposes a new extreme learning machine based cross-domain network learning framework, called Extreme Learning Machine (ELM) based Domain Adaptation (EDA...
Size-Sensitive Perceptual Representations Underlie Visual and Haptic Object Recognition
A variety of similarities between visual and haptic object recognition suggests that the two modalities may share common representations. However, it is unclear whether such common representations preserve low-level perceptual features or whether transfer between vision and haptics is mediated by high-level, abstract representations. Two experiments used a sequential shape-matching task to exam...
Invariant recognition drives neural representations of action sequences
Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of h...
A comparison of local versus global image decompositions for visual speechreading
What is the appropriate spatial scale for image representation? In the primate visual system, receptive fields are small at early stages of processing (area V1), and larger at late stages of processing (areas MT, IT). In the current work, we explore the efficiency of local and global image representations on an automatic visual speech recognition task using an HMM as the recognition system. We comp...
Learning Transferrable Representations for Unsupervised Domain Adaptation
Supervised learning with large-scale labelled datasets and deep layered models has caused a paradigm shift in diverse areas of learning and recognition. However, this approach still suffers from generalization issues in the presence of a domain shift between the training and the test data distributions. Since unsupervised domain adaptation algorithms directly address this domain shift problem...
Publication date: 2017